5.3 Feature-based Approach with BERT

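I get the impression that the feature-based approach is often cited in ways that do not match the original BERT paper, so I read this section myself.

Does this claim get overturned in later papers?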

The paper's claim: BERT works both ways, whether fine-tuned or used as a feature extractor.

BERT is effective for both fine-tuning and feature-based approaches

All of the BERT results presented so far have used the fine-tuning approach

Section 5.3 examines the feature-based approach.

The two approaches are laid out in Section 1 (still on my to-read pile).

feature-based approach, where fixed features are extracted from the pre-trained model

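In other words, the pre-trained model is left as-is and the fixed features extracted from it are what gets used.

I picture something like the [CLS] vector or the mean of the per-token vectors.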
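
A minimal sketch of the two pooled features I have in mind, with plain Python lists standing in for real encoder outputs. The function names and toy vectors are hypothetical; a real BERT BASE would give one 768-dimensional vector per token, with [CLS] at position 0.

```python
# Toy stand-in for the frozen encoder's final hidden states:
# one vector per position, with [CLS] at index 0 (hypothetical values).
hidden_states = [
    [1.0, 0.0, 2.0],  # [CLS]
    [3.0, 4.0, 0.0],  # token 1
    [5.0, 2.0, 4.0],  # token 2
]


def cls_feature(states):
    """Use the [CLS] vector (first position) as the sentence feature."""
    return list(states[0])


def mean_feature(states):
    """Average the vectors dimension by dimension.

    Assumption: every position is included, [CLS] as well.
    """
    n = len(states)
    return [sum(column) / n for column in zip(*states)]


print(cls_feature(hidden_states))   # [1.0, 0.0, 2.0]
print(mean_feature(hidden_states))  # [3.0, 2.0, 2.0]
```

Either way the encoder itself is never updated; only whatever model consumes these vectors is trained.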

Two advantages of the feature-based approach:

not all tasks can be easily represented by a Transformer encoder architecture,

In other words, some tasks do not fit a Transformer encoder architecture neatly.

A task-specific model architecture (one that does not use a Transformer) may be called for.

major computational benefit to pre-compute an expensive representation of the training data once

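In other words, the costly representation of the training data is computed once, up front, which is a major computational saving.

(The paper goes on to say that many experiments can then be run with cheaper models on top of this representation.)

I think this is said with the heavy cost of fine-tuning experiments in mind.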
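
A rough sketch of the "pre-compute once, experiment many times" pattern, to make the cost argument concrete. Everything here is hypothetical: `expensive_encode` is a toy stand-in for a forward pass through the frozen encoder, and `cheap_model` stands in for a small downstream model.

```python
calls = {"encoder": 0}  # counts how often the expensive step runs


def expensive_encode(sentence):
    """Hypothetical stand-in for a forward pass through the frozen encoder."""
    calls["encoder"] += 1
    return [float(len(word)) for word in sentence.split()]


def precompute(sentences):
    """Run the expensive encoder once per sentence and keep the results."""
    return {s: expensive_encode(s) for s in sentences}


def cheap_model(features, scale):
    """Hypothetical cheap downstream model; `scale` plays the role of a hyperparameter."""
    return sum(features) * scale


train = ["BERT is large", "features are fixed"]
cache = precompute(train)

# Many cheap experiments reuse the cached features; the encoder is not called again.
results = {
    scale: [cheap_model(cache[s], scale) for s in train]
    for scale in (0.1, 0.5, 1.0)
}

print(calls["encoder"])  # 2: one call per training sentence, regardless of experiment count
```

With fine-tuning, every experiment would pay for the encoder again, because its weights change from run to run.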

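On the NER task (CoNLL-2003), taking "Concat Last Four Hidden" features from BERT BASE gave a Dev F1 only 0.3 behind fine-tuning (96.1 vs 96.4).

This is the basis for the claim that BERT is effective with both approaches.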
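
A small sketch of what "Concat Last Four Hidden" amounts to, as I understand it: for each token, the hidden vectors from the last four layers are joined into one longer vector. The function name, the toy shapes, and the concatenation order are my assumptions. With BERT BASE (hidden size 768) each token feature would have 4 × 768 = 3072 dimensions; in the paper these features are then fed to a BiLSTM followed by the classification layer.

```python
def concat_last_four(layer_outputs):
    """Concatenate each token's vectors from the last four layers.

    layer_outputs[l][t] is the hidden vector of token t at layer l.
    Assumption: layers are joined in their original order.
    """
    last_four = layer_outputs[-4:]
    num_tokens = len(last_four[0])
    features = []
    for t in range(num_tokens):
        token_vec = []
        for layer in last_four:
            token_vec.extend(layer[t])
        features.append(token_vec)
    return features


# Toy activations: 12 layers, 3 tokens, hidden size 2 (BERT BASE would use 768).
# Each vector is filled with its layer index so the result is easy to read.
num_layers, num_tokens, hidden = 12, 3, 2
layers = [
    [[float(l)] * hidden for _ in range(num_tokens)]
    for l in range(num_layers)
]

features = concat_last_four(layers)
print(len(features), len(features[0]))  # 3 8
print(features[0])  # [8.0, 8.0, 9.0, 9.0, 10.0, 10.0, 11.0, 11.0]
```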

Appendix C probably has the details (not read yet).